Marker overview

# A tibble: 5 × 8
  target  stage_expressed gene_id_3d7   chrom coords_3d7_amplified seq_length
  <chr>   <chr>           <chr>         <chr> <chr>                     <dbl>
1 *ama1*  blood           PF3D7_1133400 11    1294312-1294613             300
2 *csp*   liver           PF3D7_0304600 03    221351-221640               288
3 *msp7*  blood           PF3D7_1335100 13    1419236-1419567             330
4 *sera2* blood           PF3D7_0207900 02    320762-321022               259
5 *trap*  liver           PF3D7_1335900 13    1465058-1465379             320
# ℹ 2 more variables: gc_content <chr>, pf6k_variant_positions <chr>

Mixtures

Reference strain sequences

Summary of pairwise distances

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   2.000   4.000   4.956   7.000  15.000 

Read summary

# A tibble: 3 × 6
  dens        raw_reads n_reads_dada2     n mean_all_raw mean_all_dada2
  <fct>           <dbl>         <dbl> <int>        <dbl>          <dbl>
1 <1.5 g/µL     4289511       1253629    90       47661.         13929.
2 1.5-75 g/µL   7958631       2577615    90       88429.         28640.
3 ≥75 g/µL     10157092       3391101    84      120918.         40370.
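
The per-sample means in the table above follow from dividing each density group's read total by its number of samples. A minimal base R sketch, with the totals copied from the table; the object name `read_summary` and the extra `pct_retained` column are illustrative and not part of the original analysis.

```r
# Totals copied from the read summary table; density labels written in ASCII
read_summary <- data.frame(
  dens          = c("<1.5", "1.5-75", ">=75"),
  raw_reads     = c(4289511, 7958631, 10157092),
  n_reads_dada2 = c(1253629, 2577615, 3391101),
  n             = c(90, 90, 84)
)

# Mean reads per sample before and after DADA2 processing
read_summary$mean_all_raw   <- read_summary$raw_reads / read_summary$n
read_summary$mean_all_dada2 <- read_summary$n_reads_dada2 / read_summary$n

# Illustrative extra column: percent of raw reads retained after DADA2
read_summary$pct_retained <-
  100 * read_summary$n_reads_dada2 / read_summary$raw_reads

print(read_summary)
```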

Number of samples that returned usable reads

[1] 257

Per-sample read depth

Number of haplotype occurrences per target

Median number of reads per target

[1] 1542

Amplicon sequences

False positives

Overview numbers

# A tibble: 3 × 6
  hap_type           n_type sum_reads   tot pct_count pct_reads
  <fct>               <int>     <dbl> <int>     <dbl>     <dbl>
1 Expected reference    859   6382389  1292      66.5      88.4
2 Systematic error      254    761330  1292      19.7      10.5
3 Random error          179     78626  1292      13.9       1.1
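
The two percentage columns above can be recomputed from the counts and read totals alone. The sketch below does this in base R, assuming `pct_count` is each type's share of all haplotype occurrences and `pct_reads` its share of all reads; the object name `hap_overview` is illustrative.

```r
# Counts and read totals copied from the overview table
hap_overview <- data.frame(
  hap_type  = c("Expected reference", "Systematic error", "Random error"),
  n_type    = c(859, 254, 179),
  sum_reads = c(6382389, 761330, 78626)
)

# Share of haplotype occurrences and share of reads, in percent
hap_overview$pct_count <-
  round(100 * hap_overview$n_type / sum(hap_overview$n_type), 1)
hap_overview$pct_reads <-
  round(100 * hap_overview$sum_reads / sum(hap_overview$sum_reads), 1)

print(hap_overview)
```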

Sample-level overview

Proportion of reads and haplotype occurrences that are false positive

Summary statistics of false positives

# A tibble: 15 × 6
# Groups:   target [5]
   target  hap_type           n_type n_reads pct_type pct_reads
   <chr>   <fct>               <int>   <dbl>    <dbl>     <dbl>
 1 *ama1*  Expected reference    205 1799151     70.9      95.7
 2 *ama1*  Systematic error       30   51043     10.4       2.7
 3 *ama1*  Random error           54   30340     18.7       1.6
 4 *csp*   Expected reference    225 2313875     77.6      87.4
 5 *csp*   Systematic error       40  323921     13.8      12.2
 6 *csp*   Random error           25   10152      8.6       0.4
 7 *msp7*  Expected reference    201  541878     74.4      78.2
 8 *msp7*  Systematic error       53  145442     19.6      21  
 9 *msp7*  Random error           16    5536      5.9       0.8
10 *sera2* Expected reference    175 1326908     65.8      87.7
11 *sera2* Systematic error       63  179454     23.7      11.9
12 *sera2* Random error           28    7102     10.5       0.5
13 *trap*  Expected reference     53  400577     29.9      82.2
14 *trap*  Systematic error       68   61470     38.4      12.6
15 *trap*  Random error           56   25496     31.6       5.2
# A tibble: 3 × 6
# Groups:   dens [3]
  dens        hap_type     n_type n_reads pct_type pct_reads
  <fct>       <fct>         <int>   <dbl>    <dbl>     <dbl>
1 <1.5 g/µL   Random error     41   40011     16.1       3.2
2 1.5-75 g/µL Random error     64   19844     12.3       0.8
3 ≥75 g/µL    Random error     74   18771     14.3       0.6

Number of reads supporting true positives vs false positives

# A tibble: 1 × 3
  median_in median_not_in wilcox_p
      <dbl>         <dbl>    <dbl>
1      2393           104 1.32e-70

Read depth of false positives

Characteristics of false positives

Identifying optimal thresholds

Sensitivity-specificity plots

Optimal thresholds table

# A tibble: 9 × 6
  metric      threshold     ci025    median    ci975 type 
  <chr>           <dbl>     <dbl>     <dbl>    <dbl> <chr>
1 threshold   275       204.      275       420.     depth
2 specificity   0.749     0.676     0.749     0.821  depth
3 sensitivity   0.950     0.917     0.952     0.972  depth
4 threshold     0.00713   0.00521   0.00808   0.0139 prop 
5 specificity   0.525     0.458     0.559     0.682  prop 
6 sensitivity   0.970     0.903     0.964     0.987  prop 
7 threshold     0.208     0.0895    0.208     0.361  ratio
8 specificity   0.671     0.443     0.671     0.848  ratio
9 sensitivity   0.817     0.718     0.831     0.930  ratio
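
For reference, this is how sensitivity and specificity could be evaluated for a single read-depth threshold. It is a sketch under stated assumptions, not the procedure used to pick the thresholds above: the helper `sens_spec()` is hypothetical, a haplotype is assumed to be kept when its reads are at or above the threshold, and the input vectors are toy values rather than study data.

```r
# Hypothetical helper: performance of a minimum-reads filter.
#   reads     reads supporting each haplotype occurrence
#   is_true   TRUE if the occurrence is an expected reference haplotype
#   threshold occurrences with reads >= threshold are kept (assumed convention)
sens_spec <- function(reads, is_true, threshold) {
  kept <- reads >= threshold
  c(
    sensitivity = sum(kept & is_true) / sum(is_true),    # true haplotypes kept
    specificity = sum(!kept & !is_true) / sum(!is_true)  # false haplotypes removed
  )
}

# Toy data for illustration only (not from the study)
reads   <- c(2400, 1800, 950, 300, 120, 60, 500, 90, 40, 15)
is_true <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

res <- sens_spec(reads, is_true, threshold = 275)
print(res)
```

Repeating the call over a grid of thresholds gives the points of a sensitivity-specificity curve; resampling the occurrences with replacement and repeating would give interval estimates of the kind shown in the `ci025` and `ci975` columns, although the resampling scheme actually used is not stated here.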

Number of haplotype occurrences and haplotypes pre- and post-censoring

[1] 1292
[1] 975
[1] 0.754644
[1] 124
[1] 59
[1] 0.4758065
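
The six values above are the counts before and after censoring and the fraction retained, first for haplotype occurrences and then for unique haplotypes. A short base R sketch that reproduces the two fractions from the counts; the variable names are illustrative.

```r
# Counts copied from the output above
n_occ_pre  <- 1292  # haplotype occurrences before censoring
n_occ_post <- 975   # haplotype occurrences after censoring
n_hap_pre  <- 124   # unique haplotypes before censoring
n_hap_post <- 59    # unique haplotypes after censoring

# Fraction retained after censoring
frac_occ_retained <- n_occ_post / n_occ_pre
frac_hap_retained <- n_hap_post / n_hap_pre

print(c(occurrences = frac_occ_retained, haplotypes = frac_hap_retained))
```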

Number of samples that returned no haplotypes post-censoring

[1] 3

Upset plot of how haplotypes are censored

Summary of censored and uncensored haplotypes

# A tibble: 6 × 5
# Groups:   hap_type [3]
  censored     hap_type           count   tot   pct
  <chr>        <fct>              <int> <int> <dbl>
1 Censored     Expected reference    67   859     8
2 Censored     Systematic error     102   254    40
3 Censored     Random error         148   179    83
4 Not censored Expected reference   792   859    92
5 Not censored Systematic error     152   254    60
6 Not censored Random error          31   179    17

Summary of how haplotypes were censored

# A tibble: 4 × 4
# Groups:   name [4]
  name            n_fps     n   pct
  <chr>           <int> <int> <dbl>
1 lendiff           179    50    28
2 prop_lt_thresh    179    97    54
3 ratio_lt_thresh   179    53    30
4 reads_lt_thresh   179   134    75
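
The four flag names in the table above suggest that each haplotype is checked against several criteria and that one haplotype can fail more than one of them. The sketch below shows one way such flags could be derived and combined. It rests on assumptions: the point estimates from the optimal thresholds table are used as cut-offs, a haplotype is treated as censored when any flag is raised, and the `haps` data frame with its columns `reads`, `prop`, `ratio` and `lendiff` is a hypothetical toy input.

```r
# Cut-offs taken from the optimal thresholds table (assumed to be the ones applied)
thresholds <- c(reads = 275, prop = 0.00713, ratio = 0.208)

# Toy input for illustration only: one row per haplotype occurrence
haps <- data.frame(
  reads   = c(2400, 150, 900, 60),
  prop    = c(0.40, 0.004, 0.02, 0.001),
  ratio   = c(0.90, 0.10, 0.15, 0.05),
  lendiff = c(FALSE, FALSE, FALSE, TRUE)  # TRUE if the length check fails
)

# One logical column per censoring criterion
flags <- data.frame(
  lendiff         = haps$lendiff,
  prop_lt_thresh  = haps$prop  < thresholds[["prop"]],
  ratio_lt_thresh = haps$ratio < thresholds[["ratio"]],
  reads_lt_thresh = haps$reads < thresholds[["reads"]]
)

# Assumed rule: censor when at least one flag is raised
haps$censored <- rowSums(flags) > 0

print(colSums(flags))  # number of occurrences caught by each criterion
print(haps)
```

Because the flags overlap, their per-criterion counts need not add up to the number of censored haplotypes.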

Comparison of haplotypes that passed and did not pass censoring

Censoring results by density

Censoring of false positive haplotypes by density

# A tibble: 3 × 5
  dens      No   Yes   tot  prop
  <fct>  <int> <int> <int> <dbl>
1 <1.5      11    13    24 0.458
2 1.5-75     6    39    45 0.133
3 ≥75        0    54    54 0    

    Fisher's Exact Test for Count Data

data:  .
p-value = 1.964e-07
alternative hypothesis: two.sided
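
The test above can be rerun directly from the count table. A minimal base R sketch, with the `No` and `Yes` counts copied from the table; the object names are illustrative, and the density labels are written in ASCII.

```r
# Rows: density group; columns: counts from the No and Yes columns
censor_by_dens <- matrix(
  c(11, 13,
     6, 39,
     0, 54),
  ncol = 2, byrow = TRUE,
  dimnames = list(dens = c("<1.5", "1.5-75", ">=75"),
                  censored = c("No", "Yes"))
)

# Row-wise proportion in the No column (the `prop` column of the table)
prop_no <- censor_by_dens[, "No"] / rowSums(censor_by_dens)
print(round(prop_no, 3))

# Fisher's exact test on the 3 x 2 table
ft <- fisher.test(censor_by_dens)
print(ft$p.value)
```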

Censored true positives

Characteristics of censored true positives

# A tibble: 3 × 4
# Groups:   dens [3]
  dens     tot     n  prop
  <fct>  <int> <int> <dbl>
1 <1.5      67    13 0.194
2 1.5-75    67    15 0.224
3 ≥75       67    39 0.582
# A tibble: 2 × 4
# Groups:   ref_pct >= 10 [2]
  `ref_pct >= 10`   tot     n   prop
  <lgl>           <int> <int>  <dbl>
1 FALSE              67    62 0.925 
2 TRUE               67     5 0.0746

Replicate variability

Percent agreement across replicates

# A tibble: 3 × 4
# Groups:   n [3]
      n   tot    nn  prop
  <int> <int> <int> <dbl>
1     1   416    99 0.238
2     2   416    75 0.180
3     3   416   242 0.582
# A tibble: 9 × 5
# Groups:   dens, n [9]
  dens            n   tot    nn  prop
  <fct>       <int> <int> <int> <dbl>
1 <1.5 g/µL       1   109    51 0.468
2 <1.5 g/µL       2   109    25 0.229
3 <1.5 g/µL       3   109    33 0.303
4 1.5-75 g/µL     1   163    32 0.196
5 1.5-75 g/µL     2   163    31 0.190
6 1.5-75 g/µL     3   163   100 0.613
7 ≥75 g/µL        1   144    16 0.111
8 ≥75 g/µL        2   144    19 0.132
9 ≥75 g/µL        3   144   109 0.757

Pairwise Jaccard similarity of replicates

# A tibble: 1 × 2
  median_jac iqr_jac
       <dbl>   <dbl>
1      0.833     0.5
# A tibble: 3 × 3
  dens   median_jac iqr_jac
  <fct>       <dbl>   <dbl>
1 <1.5        0.5      0.75
2 1.5-75      0.833    0.5 
3 ≥75         1        0.2 
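
The Jaccard index of two replicates is the number of haplotypes they share divided by the number of haplotypes found in either; the matching distance is one minus that index. The sketch below computes the index for every pair of replicates of one sample. The helper `jaccard_index()` and the haplotype labels are hypothetical, and which of the two conventions produced `median_jac` is not stated in the output, so treating it as the index is an assumption.

```r
# Hypothetical helper: Jaccard index of two sets of haplotype labels
jaccard_index <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  u <- union(a, b)
  if (length(u) == 0) return(NA_real_)  # undefined when both sets are empty
  length(intersect(a, b)) / length(u)
}

# Toy replicates of one sample, for illustration only
reps <- list(
  rep1 = c("hapA", "hapB", "hapC"),
  rep2 = c("hapA", "hapB", "hapD"),
  rep3 = c("hapA", "hapB", "hapC")
)

# All replicate pairs and their Jaccard index
pairs <- combn(names(reps), 2)
jac <- apply(pairs, 2, function(p) jaccard_index(reps[[p[1]]], reps[[p[2]]]))
names(jac) <- apply(pairs, 2, paste, collapse = "-")

print(jac)
print(c(median_jac = median(jac), iqr_jac = IQR(jac)))
```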

Missing haplotypes

Number of expected haplotypes that were missing

# A tibble: 2 × 4
# Groups:   found [2]
  found   tot     n  prop
  <chr> <int> <int> <dbl>
1 No     1365   477 0.349
2 Yes    1365   888 0.651

Found and missing haplotypes by reference percent

Proportion missing by read depth

Correlation between reference and read proportion

Replicate missingness by reference proportion

Number of replicates in which haplotype was found

# A tibble: 12 × 5
# Groups:   dens [3]
   dens        in_n_rep count   tot   pct
   <fct>          <int> <int> <int> <dbl>
 1 ≥75 g/µL           0    30   166    18
 2 ≥75 g/µL           1     5   166     3
 3 ≥75 g/µL           2    14   166     8
 4 ≥75 g/µL           3   117   166    70
 5 1.5-75 g/µL        0    62   202    31
 6 1.5-75 g/µL        1    20   202    10
 7 1.5-75 g/µL        2    27   202    13
 8 1.5-75 g/µL        3    93   202    46
 9 <1.5 g/µL          0    78   158    49
10 <1.5 g/µL          1    34   158    22
11 <1.5 g/µL          2    21   158    13
12 <1.5 g/µL          3    25   158    16

Missingness risk factors

# A tibble: 9 × 4
  feature  term            bivariate                    multivariate            
  <chr>    <chr>           <chr>                        <chr>                   
1 ref_prop ref_pct         0.98 (0.97-0.98); p=4.7e-11  0.96 (0.96-0.97); p=6e-…
2 target   target*csp*     0.76 (0.49-1.18); p=0.23     0.9 (0.54-1.51); p=0.7  
3 target   target*msp7*    1.24 (0.81-1.89); p=0.32     0.4 (0.23-0.68); p=8e-04
4 target   target*sera2*   1.68 (1.1-2.57); p=0.016     1.05 (0.63-1.77); p=0.8 
5 target   target*trap*    21.37 (13.02-35.08); p=1e-33 6.13 (3.13-12.03); p=1e…
6 density  dens1.5-75 g/µL 1.62 (0.76-3.45); p=0.21     1.47 (0.75-2.88); p=0.3 
7 density  dens<1.5 g/µL   6.27 (2.87-13.67); p=3.9e-06 3.88 (1.82-8.27); p=5e-…
8 reads    reads_10000     0.57 (0.53-0.62); p=5.7e-40  0.61 (0.54-0.69); p=3e-…
9 moi      expected_moi    0.32 (0.24-0.44); p=1.2e-12  1.08 (0.91-1.27); p=0.4 
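
The entries above read like odds ratios with 95% confidence intervals and p-values, which is what a logistic regression of missingness on each feature would produce. The model form is an assumption, since the output does not name it. The sketch below fits such a model to simulated data and extracts odds ratios with Wald intervals; the data, the effect sizes and the object names are invented for illustration and do not reproduce the study estimates.

```r
set.seed(1)

# Simulated data for illustration only (not study data)
n <- 500
dat <- data.frame(
  ref_pct     = runif(n, 1, 100),  # reference proportion of the haplotype, in percent
  reads_10000 = rexp(n, 1)         # read depth in units of 10,000 reads
)
lin_pred    <- 0.5 - 0.03 * dat$ref_pct - 0.5 * dat$reads_10000
dat$missing <- rbinom(n, 1, plogis(lin_pred))

# Multivariable logistic regression of missingness
fit <- glm(missing ~ ref_pct + reads_10000, data = dat, family = binomial)

# Odds ratios with Wald 95% confidence intervals and p-values
or_table <- cbind(
  OR = exp(coef(fit)),
  exp(confint.default(fit)),
  p  = summary(fit)$coefficients[, "Pr(>|z|)"]
)
print(signif(or_table, 3))
```

Fitting one model per feature gives estimates of the kind shown in the bivariate column, and a single model with all features gives the multivariate ones.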

MOI & clinical samples

Observed vs expected MOI

# A tibble: 3 × 4
# Groups:   obs_min_exp_moi_cat [3]
  obs_min_exp_moi_cat    tot     n   pct
  <chr>                <int> <int> <dbl>
1 Higher than expected   254    26    10
2 Lower than expected    254   154    61
3 Same as expected       254    74    29
# A tibble: 1 × 4
  median_low median_mid median_high      wilcox_p
       <dbl>      <dbl>       <dbl>         <dbl>
1         -4         -1          -1 0.00000000221
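
The three categories in `obs_min_exp_moi_cat` correspond to the sign of observed minus expected MOI. A minimal base R sketch of that classification on toy values; the vectors `expected_moi` and `observed_moi` are illustrative and are not study data.

```r
# Toy values for illustration only
expected_moi <- c(1, 2, 3, 4, 5, 5)
observed_moi <- c(1, 2, 2, 5, 1, 3)

# Observed minus expected MOI per sample
diff_moi <- observed_moi - expected_moi

# Classify each sample by the sign of the difference
obs_min_exp_moi_cat <- ifelse(
  diff_moi > 0, "Higher than expected",
  ifelse(diff_moi < 0, "Lower than expected", "Same as expected")
)

# Counts and rounded percentages per category
tab <- table(obs_min_exp_moi_cat)
print(tab)
print(round(100 * prop.table(tab)))
```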

Clinical samples

# A tibble: 2 × 4
# Groups:   censored [2]
  censored   tot     n   pct
  <lgl>    <int> <int> <dbl>
1 FALSE      142   106    75
2 TRUE       142    36    25